nd608 - Project Personalized Real Estate Agent

In [ ]:
# Load environment variables from a .env file (requires python-dotenv).
# Alternatively, set OPENAI_API_KEY manually by replacing the placeholder below.

from os import environ

try:
    from dotenv import load_dotenv
    load_dotenv()
except ModuleNotFoundError:
    pass

if "OPENAI_API_KEY" not in environ:
    environ["OPENAI_API_KEY"] = "your-openai-api-key"
In [ ]:
import pickle

from io import BytesIO
from pathlib import Path
from textwrap import dedent

import openai
import requests
import torch

from IPython.display import display, Markdown
from langchain_openai import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.output_parsers import PydanticOutputParser
from PIL import Image
from pydantic import BaseModel, Field, NonNegativeFloat, NonNegativeInt
from transformers import AutoTokenizer, CLIPProcessor, CLIPModel

from db import get_db, init_db, get_listings_table
from models import RealEstateListingLanceRecord
In [ ]:
data_dir = Path("data")
data_dir.mkdir(exist_ok=True)
In [ ]:
# Pick the fastest available device: CUDA GPU, then Apple Silicon (MPS), then CPU.
device = "cpu"

if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"

Generating Real Estate Listings

The purpose of this notebook is to generate synthetic real estate listings using OpenAI's generative AI APIs. We'll also store the generated listings, along with their embeddings, in a LanceDB table.

We'll use LangChain's PromptTemplate and PydanticOutputParser to generate the synthetic real estate listings in a structured format, which makes them easier to store in a table. We'll follow the format suggested in the project's instructions:

Neighborhood: Green Oaks
Price: $800,000
Bedrooms: 3
Bathrooms: 2
House Size: 2,000 sqft

Description: Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.

Neighborhood Description: Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bike lanes, commuting is a breeze.
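Before defining the schema, it may help to see how a structured record maps back onto the suggested text format. The sketch below is illustrative only: ListingText and render_listing are hypothetical names, and a plain dataclass stands in for the Pydantic model defined below so the snippet runs on its own. The suggested format also has a House Size field that the schema below doesn't capture, so the sketch omits it.

```python
from dataclasses import dataclass


@dataclass
class ListingText:
    """Hypothetical stand-in for the listing fields used in the suggested format."""
    neighborhood: str
    price: int
    bedrooms: int
    bathrooms: float
    description: str
    neighborhood_description: str


def render_listing(listing: ListingText) -> str:
    """Render a listing in the project's suggested plain-text layout."""
    # ":g" drops a trailing ".0", so 2.0 prints as "2" while 2.5 stays "2.5"
    bathrooms = f"{listing.bathrooms:g}"
    return "\n".join([
        f"Neighborhood: {listing.neighborhood}",
        f"Price: ${listing.price:,}",
        f"Bedrooms: {listing.bedrooms}",
        f"Bathrooms: {bathrooms}",
        "",
        f"Description: {listing.description}",
        "",
        f"Neighborhood Description: {listing.neighborhood_description}",
    ])


example = ListingText(
    neighborhood="Green Oaks",
    price=800_000,
    bedrooms=3,
    bathrooms=2,
    description="Welcome to this eco-friendly oasis nestled in the heart of Green Oaks.",
    neighborhood_description="Green Oaks is a close-knit, environmentally-conscious community.",
)
print(render_listing(example))
```

Rendering from structured fields (rather than asking the model for free text) is what keeps price and room counts consistent across listings.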
In [ ]:
class RealEstateListingModelOutput(BaseModel):
    neighborhood: str = Field(description="Name of the neighborhood")
    price: NonNegativeInt = Field(description="List price of the property")
    bedrooms: NonNegativeInt = Field(description="Number of bedrooms of the property")
    bathrooms: NonNegativeFloat | NonNegativeInt = Field(description="Number of bathrooms of the property")
    has_solar_panels: bool = Field(description="Whether the property has solar panels fitted")
    description: str = Field(description="Description of the property")
    neighborhood_description: str = Field(description="Description of the neighborhood")


class RealEstateListingsModelOutput(BaseModel):
    listings: list[RealEstateListingModelOutput]


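class RealEstateListing(RealEstateListingModelOutput):
    image_bytes: bytes | None = Field(description="Contents of the generated image", default=None)

    @property
    def image_as_pil(self) -> Image.Image:
        # Decode the stored bytes into a PIL image; there is nothing to decode before an image is generated
        if self.image_bytes is None:
            raise ValueError("This listing has no image yet")
        return Image.open(BytesIO(self.image_bytes))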
In [ ]:
parser = PydanticOutputParser(pydantic_object=RealEstateListingsModelOutput)
print(parser.get_format_instructions())
The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"$defs": {"RealEstateListingModelOutput": {"properties": {"neighborhood": {"description": "Name of the neighborhod", "title": "Neighborhood", "type": "string"}, "price": {"description": "List price of the property", "minimum": 0, "title": "Price", "type": "integer"}, "bedrooms": {"description": "Number of bedrooms of the property", "minimum": 0, "title": "Bedrooms", "type": "integer"}, "bathrooms": {"anyOf": [{"minimum": 0.0, "type": "number"}, {"minimum": 0, "type": "integer"}], "description": "Number of bathroom of the property", "title": "Bathrooms"}, "has_solar_panels": {"description": "Whether the property has solar panels fitted", "title": "Has Solar Panels", "type": "boolean"}, "description": {"description": "Description of the property", "title": "Description", "type": "string"}, "neighborhood_description": {"description": "Description of the neighborhood", "title": "Neighborhood Description", "type": "string"}}, "required": ["neighborhood", "price", "bedrooms", "bathrooms", "has_solar_panels", "description", "neighborhood_description"], "title": "RealEstateListingModelOutput", "type": "object"}}, "properties": {"listings": {"items": {"$ref": "#/$defs/RealEstateListingModelOutput"}, "title": "Listings", "type": "array"}}, "required": ["listings"]}
```
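The schema above boils down to a handful of rules: every listing needs all seven fields, price and bedrooms must be non-negative integers, bathrooms may be a non-negative integer or float (e.g. 2.5), and has_solar_panels must be a boolean. As a rough, stdlib-only illustration of those rules (not how PydanticOutputParser works internally, since it delegates validation to Pydantic), here is a hypothetical validate_listings function. It is stricter than Pydantic's default lax mode, which would, for example, coerce the string "3" to an integer.

```python
import json

REQUIRED = (
    "neighborhood", "price", "bedrooms", "bathrooms",
    "has_solar_panels", "description", "neighborhood_description",
)


def validate_listings(text: str) -> list[dict]:
    """Parse a JSON completion and check it against the listing schema's rules."""
    data = json.loads(text)
    listings = data["listings"]
    if not isinstance(listings, list):
        raise ValueError("'listings' must be an array")
    for i, item in enumerate(listings):
        missing = [key for key in REQUIRED if key not in item]
        if missing:
            raise ValueError(f"listing {i} is missing {missing}")
        # bool is a subclass of int in Python, so rule it out explicitly
        for key in ("price", "bedrooms"):
            value = item[key]
            if isinstance(value, bool) or not isinstance(value, int) or value < 0:
                raise ValueError(f"listing {i}: {key} must be a non-negative integer")
        bathrooms = item["bathrooms"]
        if isinstance(bathrooms, bool) or not isinstance(bathrooms, (int, float)) or bathrooms < 0:
            raise ValueError(f"listing {i}: bathrooms must be a non-negative number")
        if not isinstance(item["has_solar_panels"], bool):
            raise ValueError(f"listing {i}: has_solar_panels must be a boolean")
    return listings
```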
In [ ]:
prompt = PromptTemplate(
    template=dedent("""\
        You are a writer and a real estate expert with extensive
        knowledge of the terminology, capable of writing lengthy,
        easy-to-read and factual descriptions of properties.

        Generate {num_listings} listings of imaginary real estate
        properties. The description of the property should include detailed
        mentions of the property's features like the number of bedrooms and
        bathrooms. The description of the property should include details
        about the exterior as well. The description of the property should
        contain 3 sentences. Include both upper-middle class and lower
        income neighborhoods. The neighborhood income level should be
        consistent with the property. Include a few fixer-upper properties.
    """) + "\n{format_instructions}",
    input_variables=["num_listings"],
    partial_variables={
        "format_instructions": parser.get_format_instructions
    },
)
In [ ]:
print(prompt.format(num_listings=15))
You are a writer and a real estate expert with extensive
knowledge of the terminolgy and a capable of writing lengthy,
easy to read and factual descriptions of properties.

Generate 15 listings of imaginary real estate
properties. The description of the property should include detailed
mentions of the property's features like the number of bedrooms and
bathrooms. The description of the property should include details
about the exterior as well. The description of the property should
contain 3 sentences. Include both upper-middle class and lower
income neighborhoods. The neighborhood income level should be
consistent with the property. Include a few fixer-upper properties.

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"$defs": {"RealEstateListingModelOutput": {"properties": {"neighborhood": {"description": "Name of the neighborhod", "title": "Neighborhood", "type": "string"}, "price": {"description": "List price of the property", "minimum": 0, "title": "Price", "type": "integer"}, "bedrooms": {"description": "Number of bedrooms of the property", "minimum": 0, "title": "Bedrooms", "type": "integer"}, "bathrooms": {"anyOf": [{"minimum": 0.0, "type": "number"}, {"minimum": 0, "type": "integer"}], "description": "Number of bathroom of the property", "title": "Bathrooms"}, "has_solar_panels": {"description": "Whether the property has solar panels fitted", "title": "Has Solar Panels", "type": "boolean"}, "description": {"description": "Description of the property", "title": "Description", "type": "string"}, "neighborhood_description": {"description": "Description of the neighborhood", "title": "Neighborhood Description", "type": "string"}}, "required": ["neighborhood", "price", "bedrooms", "bathrooms", "has_solar_panels", "description", "neighborhood_description"], "title": "RealEstateListingModelOutput", "type": "object"}}, "properties": {"listings": {"items": {"$ref": "#/$defs/RealEstateListingModelOutput"}, "title": "Listings", "type": "array"}}, "required": ["listings"]}
```

We'll use OpenAI's gpt-4-turbo model because it is more likely to follow the instructions closely, including the JSON output format.

In [ ]:
llm = ChatOpenAI(
    model_name="gpt-4-turbo",
    temperature=0.2,  # Sacrificing reproducibility to give the model some leeway
    max_tokens=4000
)

We pipe the model's completion to the LangChain output parser to generate the Pydantic models of the listings.
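Conceptually, llm | parser composes two steps so that the model's output becomes the parser's input. The sketch below mimics that with a hypothetical Step class. It is not LangChain's implementation (the real RunnableSequence also supports batching, streaming and async calls); it only illustrates left-to-right composition.

```python
import json
from typing import Any, Callable


class Step:
    """Hypothetical stand-in for a LangChain Runnable: wraps a one-argument function."""

    def __init__(self, fn: Callable[[Any], Any]) -> None:
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # Left-to-right composition: self's output becomes other's input
        return Step(lambda value: other.fn(self.fn(value)))

    def invoke(self, value: Any) -> Any:
        return self.fn(value)


# A fake "model" that always answers with JSON text, and a "parser" that decodes it
fake_llm = Step(lambda prompt: '{"listings": []}')
fake_parser = Step(json.loads)

chain = fake_llm | fake_parser
print(chain.invoke("Generate 15 listings"))  # {'listings': []}
```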

In [ ]:
parsed_model_response = (llm | parser).invoke(prompt.format(num_listings=15))
parsed_model_response.listings
Out[ ]:
[RealEstateListingModelOutput(neighborhood='Maplewood Estates', price=450000, bedrooms=4, bathrooms=3, has_solar_panels=True, description='This charming 4-bedroom, 3-bathroom home in Maplewood Estates features a modern kitchen with stainless steel appliances and granite countertops. The spacious backyard is perfect for entertaining with its large patio and mature landscaping. Solar panels and energy-efficient windows ensure low utility bills.', neighborhood_description='Maplewood Estates is a family-friendly neighborhood known for its excellent schools and community parks.'),
 RealEstateListingModelOutput(neighborhood='Cedar Grove', price=320000, bedrooms=3, bathrooms=2, has_solar_panels=False, description='Nestled in the quiet Cedar Grove neighborhood, this 3-bedroom, 2-bathroom home boasts a cozy fireplace and hardwood floors throughout. The property includes a two-car garage and a private, fenced backyard. Ideal for first-time homebuyers or small families.', neighborhood_description='Cedar Grove is a peaceful, suburban community with convenient access to local shopping and dining.'),
 RealEstateListingModelOutput(neighborhood='Downtown Lofts', price=550000, bedrooms=2, bathrooms=2, has_solar_panels=False, description='Experience urban living in this luxurious 2-bedroom, 2-bathroom loft located in the heart of downtown. The open floor plan features high ceilings, exposed brick walls, and large windows that flood the space with natural light. Just steps away from vibrant nightlife, fine dining, and boutique shopping.', neighborhood_description='Downtown Lofts is a bustling urban area popular among young professionals and artists.'),
 RealEstateListingModelOutput(neighborhood='Pinehurst Village', price=275000, bedrooms=3, bathrooms=1.5, has_solar_panels=False, description='A perfect starter home, this 3-bedroom, 1.5-bathroom house in Pinehurst Village offers a comfortable living space at an affordable price. The home features a large living room, a kitchen with ample storage, and a spacious backyard. Some cosmetic updates needed, making it a great opportunity for customization.', neighborhood_description='Pinehurst Village is a budget-friendly neighborhood with a close-knit community feel.'),
 RealEstateListingModelOutput(neighborhood='Elmwood Park', price=750000, bedrooms=5, bathrooms=4, has_solar_panels=True, description="Luxury meets sustainability in this stunning 5-bedroom, 4-bathroom home in prestigious Elmwood Park. Features include a gourmet kitchen, a master suite with a spa-like bathroom, and a beautifully landscaped garden. Energy-efficient solar panels and a modern HVAC system add to the home's appeal.", neighborhood_description='Elmwood Park is an upscale neighborhood known for its luxurious homes and manicured lawns.'),
 RealEstateListingModelOutput(neighborhood='Riverbend', price=210000, bedrooms=2, bathrooms=1, has_solar_panels=False, description='Cozy and inviting, this 2-bedroom, 1-bathroom cottage in Riverbend is ideal for those looking to downsize. The home features a quaint front porch, an eat-in kitchen, and a manageable yard. A peaceful retreat from the hustle and bustle of city life.', neighborhood_description='Riverbend is a serene, riverside community popular with retirees and small families.'),
 RealEstateListingModelOutput(neighborhood='Sunnydale', price=190000, bedrooms=3, bathrooms=2, has_solar_panels=False, description='This 3-bedroom, 2-bathroom fixer-upper in Sunnydale offers a fantastic opportunity for investors or DIY enthusiasts. The house requires significant renovations but is priced to sell and holds great potential. Located close to schools and public transport.', neighborhood_description='Sunnydale is a lower-income area undergoing revitalization, with increasing investment and community development projects.'),
 RealEstateListingModelOutput(neighborhood='Willow Creek', price=680000, bedrooms=4, bathrooms=3.5, has_solar_panels=True, description='This eco-friendly 4-bedroom, 3.5-bathroom home in Willow Creek combines luxury with sustainability. It features a state-of-the-art kitchen, a home theater room, and a deck overlooking the creek. Solar panels and a rainwater collection system minimize its environmental footprint.', neighborhood_description='Willow Creek is an environmentally conscious community with a focus on sustainable living.'),
 RealEstateListingModelOutput(neighborhood='Old Town', price=500000, bedrooms=3, bathrooms=2, has_solar_panels=False, description='Located in the historic Old Town district, this 3-bedroom, 2-bathroom Victorian home exudes old-world charm with its ornate woodwork and stained glass windows. The property includes a secret garden and a detached studio. A rare find, perfect for those who appreciate historic details.', neighborhood_description='Old Town is a historic neighborhood known for its well-preserved Victorian architecture and vibrant community events.'),
 RealEstateListingModelOutput(neighborhood='Lakeside', price=300000, bedrooms=2, bathrooms=1, has_solar_panels=False, description='Enjoy breathtaking lake views from this cozy 2-bedroom, 1-bathroom bungalow in Lakeside. The home features a wrap-around deck, perfect for relaxing and entertaining. A peaceful, scenic getaway ideal for weekend retreats or a tranquil full-time residence.', neighborhood_description='Lakeside is a quiet, picturesque community nestled along the shores of a pristine lake.'),
 RealEstateListingModelOutput(neighborhood='Greenwood Heights', price=220000, bedrooms=3, bathrooms=1.5, has_solar_panels=False, description='This affordable 3-bedroom, 1.5-bathroom home in Greenwood Heights is a great option for first-time homebuyers. It features a functional layout, a new roof, and a spacious backyard. Close to local amenities and public transportation.', neighborhood_description='Greenwood Heights is a diverse, up-and-coming neighborhood known for its community spirit and affordable housing options.'),
 RealEstateListingModelOutput(neighborhood='Silver Lake', price=850000, bedrooms=5, bathrooms=4.5, has_solar_panels=True, description='This exquisite 5-bedroom, 4.5-bathroom residence in Silver Lake offers unparalleled luxury and comfort. With its custom design, high-end finishes, and a private dock on the lake, this home is a haven of elegance. Solar panels and smart home technology ensure efficiency and convenience.', neighborhood_description='Silver Lake is an exclusive community known for its luxurious homes and stunning natural landscapes.'),
 RealEstateListingModelOutput(neighborhood='Pebble Beach', price=165000, bedrooms=2, bathrooms=1, has_solar_panels=False, description='A charming 2-bedroom, 1-bathroom cottage in Pebble Beach, perfect for those seeking a simple, low-maintenance lifestyle. The home needs some TLC but offers great potential with its compact design and proximity to the beach. An excellent opportunity for a summer home or investment property.', neighborhood_description='Pebble Beach is a quiet seaside community popular with vacationers and retirees.'),
 RealEstateListingModelOutput(neighborhood='Highland Park', price=410000, bedrooms=4, bathrooms=2.5, has_solar_panels=False, description='This spacious 4-bedroom, 2.5-bathroom home in Highland Park features a modern kitchen with updated appliances, a large family room, and a beautifully landscaped yard. Perfect for growing families or those who love to entertain. Located in a desirable school district and close to parks.', neighborhood_description='Highland Park is a vibrant, family-oriented community known for its excellent schools and active neighborhood association.'),
 RealEstateListingModelOutput(neighborhood='Meadowlands', price=235000, bedrooms=3, bathrooms=2, has_solar_panels=False, description='This well-maintained 3-bedroom, 2-bathroom home in Meadowlands is an ideal choice for those looking for quality and affordability. The home features a large living area, a master suite with a walk-in closet, and a covered patio. A quiet, friendly neighborhood perfect for families.', neighborhood_description='Meadowlands is a peaceful, suburban community with easy access to shopping centers and major highways.')]

Let's save the generated real estate listings so we don't have to call the model again on subsequent runs.

In [ ]:
with open(data_dir / "listings.pickle", "wb") as f:
    pickle.dump(parsed_model_response.listings, f)

Read the listings back.

In [ ]:
with open(data_dir / "listings.pickle", "rb") as f:
    listings = pickle.load(f)
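The save-then-reload pattern in the two cells above can be folded into a single helper that only calls the model when no cached file exists. load_or_generate is a hypothetical name; this is a minimal sketch, not part of the project code. Keep in mind that pickle should only be used with files you trust, and that unpickling the listings requires the model classes to be defined first (e.g. by re-running the cell that declares them).

```python
import pickle
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def load_or_generate(path: Path, generate: Callable[[], T]) -> T:
    """Return the pickled result at path, calling generate() only if the file is missing."""
    if path.exists():
        with open(path, "rb") as f:
            return pickle.load(f)
    result = generate()
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result
```

In the notebook this could replace both cells, e.g. listings = load_or_generate(data_dir / "listings.pickle", lambda: (llm | parser).invoke(prompt.format(num_listings=15)).listings).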

Generating Real Estate Listing Images

To make our recommendation app more useful, we'll generate an image for each listing with OpenAI's DALL·E 2. We add photography hints (shutter speed, ISO and daylight) to the prompt to nudge the model towards photorealistic images.
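OpenAI's API reference documents a 1,000-character prompt limit for dall-e-2 (4,000 for dall-e-3). The descriptions generated here are well under that, but if longer descriptions are ever used it's worth trimming the description rather than the photography hints. build_image_prompt below is a hypothetical helper sketching that; it also drops the description's trailing period, since the f-string used in this notebook otherwise produces ".." before the hints.

```python
DALLE2_MAX_PROMPT_CHARS = 1000  # documented dall-e-2 prompt limit
PHOTO_HINTS = "1/100s, ISO 100, Daylight."


def build_image_prompt(description: str, max_chars: int = DALLE2_MAX_PROMPT_CHARS) -> str:
    """Build a photorealistic prompt, trimming the description so the hints always fit."""
    prefix = "Photo of "
    suffix = f". {PHOTO_HINTS}"
    budget = max_chars - len(prefix) - len(suffix)
    if budget <= 0:
        raise ValueError("max_chars is too small for the prompt template")
    # Remove the trailing period so the result doesn't contain ".." before the hints
    body = description.strip().rstrip(".")
    if len(body) > budget:
        body = body[:budget].rstrip()
    return f"{prefix}{body}{suffix}"


print(build_image_prompt("A cozy cottage."))  # Photo of A cozy cottage. 1/100s, ISO 100, Daylight.
```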

In [ ]:
client = openai.OpenAI()
In [ ]:
listings_with_image = []

for listing in listings:
    display(Markdown(f"Generating image for listing with description: _'{listing.description}'_...."))

    dalle2_response = client.images.generate(
        model="dall-e-2",
        # Camera settings nudge the model towards photorealistic output
        prompt=f"Photo of {listing.description}. 1/100s, ISO 100, Daylight.",
        size="512x512",
        quality="standard",
        n=1,
    )

    # The API returns a temporary URL, so download the image right away
    response = requests.get(dalle2_response.data[0].url, timeout=60)
    response.raise_for_status()

    listing_with_image = RealEstateListing(
        **listing.model_dump(),
        image_bytes=response.content,
    )
    listings_with_image.append(listing_with_image)

    display(listing_with_image.image_as_pil)

Generating image for listing with description: 'This charming 4-bedroom, 3-bathroom home in Maplewood Estates features a modern kitchen with stainless steel appliances and granite countertops. The spacious backyard is perfect for entertaining with its large patio and mature landscaping. Solar panels and energy-efficient windows ensure low utility bills.'....

Generating image for listing with description: 'Nestled in the quiet Cedar Grove neighborhood, this 3-bedroom, 2-bathroom home boasts a cozy fireplace and hardwood floors throughout. The property includes a two-car garage and a private, fenced backyard. Ideal for first-time homebuyers or small families.'....

Generating image for listing with description: 'Experience urban living in this luxurious 2-bedroom, 2-bathroom loft located in the heart of downtown. The open floor plan features high ceilings, exposed brick walls, and large windows that flood the space with natural light. Just steps away from vibrant nightlife, fine dining, and boutique shopping.'....

Generating image for listing with description: 'A perfect starter home, this 3-bedroom, 1.5-bathroom house in Pinehurst Village offers a comfortable living space at an affordable price. The home features a large living room, a kitchen with ample storage, and a spacious backyard. Some cosmetic updates needed, making it a great opportunity for customization.'....

Generating image for listing with description: 'Luxury meets sustainability in this stunning 5-bedroom, 4-bathroom home in prestigious Elmwood Park. Features include a gourmet kitchen, a master suite with a spa-like bathroom, and a beautifully landscaped garden. Energy-efficient solar panels and a modern HVAC system add to the home's appeal.'....

Generating image for listing with description: 'Cozy and inviting, this 2-bedroom, 1-bathroom cottage in Riverbend is ideal for those looking to downsize. The home features a quaint front porch, an eat-in kitchen, and a manageable yard. A peaceful retreat from the hustle and bustle of city life.'....

Generating image for listing with description: 'This 3-bedroom, 2-bathroom fixer-upper in Sunnydale offers a fantastic opportunity for investors or DIY enthusiasts. The house requires significant renovations but is priced to sell and holds great potential. Located close to schools and public transport.'....

Generating image for listing with description: 'This eco-friendly 4-bedroom, 3.5-bathroom home in Willow Creek combines luxury with sustainability. It features a state-of-the-art kitchen, a home theater room, and a deck overlooking the creek. Solar panels and a rainwater collection system minimize its environmental footprint.'....

Generating image for listing with description: 'Located in the historic Old Town district, this 3-bedroom, 2-bathroom Victorian home exudes old-world charm with its ornate woodwork and stained glass windows. The property includes a secret garden and a detached studio. A rare find, perfect for those who appreciate historic details.'....

Generating image for listing with description: 'Enjoy breathtaking lake views from this cozy 2-bedroom, 1-bathroom bungalow in Lakeside. The home features a wrap-around deck, perfect for relaxing and entertaining. A peaceful, scenic getaway ideal for weekend retreats or a tranquil full-time residence.'....

Generating image for listing with description: 'This affordable 3-bedroom, 1.5-bathroom home in Greenwood Heights is a great option for first-time homebuyers. It features a functional layout, a new roof, and a spacious backyard. Close to local amenities and public transportation.'....

Generating image for listing with description: 'This exquisite 5-bedroom, 4.5-bathroom residence in Silver Lake offers unparalleled luxury and comfort. With its custom design, high-end finishes, and a private dock on the lake, this home is a haven of elegance. Solar panels and smart home technology ensure efficiency and convenience.'....

Generating image for listing with description: 'A charming 2-bedroom, 1-bathroom cottage in Pebble Beach, perfect for those seeking a simple, low-maintenance lifestyle. The home needs some TLC but offers great potential with its compact design and proximity to the beach. An excellent opportunity for a summer home or investment property.'....

Generating image for listing with description: 'This spacious 4-bedroom, 2.5-bathroom home in Highland Park features a modern kitchen with updated appliances, a large family room, and a beautifully landscaped yard. Perfect for growing families or those who love to entertain. Located in a desirable school district and close to parks.'....

Generating image for listing with description: 'This well-maintained 3-bedroom, 2-bathroom home in Meadowlands is an ideal choice for those looking for quality and affordability. The home features a large living area, a master suite with a walk-in closet, and a covered patio. A quiet, friendly neighborhood perfect for families.'....

We save the results again so we don't have to regenerate the images on subsequent runs.

In [ ]:
with open(data_dir / "listings_with_image.pickle", "wb") as f:
    pickle.dump(listings_with_image, f)

Read the listings back.

In [ ]:
with open(data_dir / "listings_with_image.pickle", "rb") as f:
    listings_with_image = pickle.load(f)

Storing Listings in a Vector Database

Vector Database Setup

We initialize the vector database, (re-)creating the listings table with the RealEstateListingLanceRecord schema from the models module.

In [ ]:
db = get_db()
init_db(db)
In [ ]:
table = get_listings_table(db)

Generating and Storing Embeddings

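We're going to use OpenAI's CLIP model (openai/clip-vit-large-patch14, loaded through Hugging Face's transformers library) to embed each listing's image. CLIP maps images and text into a shared embedding space, which is what lets a text query later be matched against the stored image embeddings.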
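The notebook stores only the image embedding for each listing. If you'd rather have the stored vector reflect both the description and the photo, one common option (an assumption here, not what this notebook does) is to L2-normalize CLIP's text and image embeddings and take a weighted average. The pure-Python sketch below shows the arithmetic; in the notebook the inputs would come from model.get_text_features and model.get_image_features, and combine_embeddings and text_weight are hypothetical names. Note that CLIP's text encoder only sees the first 77 tokens, so long descriptions get truncated.

```python
import math


def l2_normalize(vector: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def combine_embeddings(
    text_vec: list[float], image_vec: list[float], text_weight: float = 0.5
) -> list[float]:
    """Weighted average of unit-normalized text and image embeddings, renormalized."""
    if len(text_vec) != len(image_vec):
        raise ValueError("embeddings must have the same dimension")
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be between 0 and 1")
    t, i = l2_normalize(text_vec), l2_normalize(image_vec)
    # Normalizing first keeps one modality from dominating just because its vector is longer
    mixed = [text_weight * a + (1 - text_weight) * b for a, b in zip(t, i)]
    return l2_normalize(mixed)
```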

In [ ]:
clip_model = "openai/clip-vit-large-patch14"

model = CLIPModel.from_pretrained(clip_model).to(device)
processor = CLIPProcessor.from_pretrained(clip_model)
tokenizer = AutoTokenizer.from_pretrained(clip_model)
In [ ]:
def get_listing_embeddings(listing: RealEstateListing) -> torch.Tensor:
    # Only the image is embedded; CLIP's shared text/image space lets text queries match it later
    processor_output = processor(images=listing.image_as_pil, return_tensors="pt")
    pixel_values = processor_output["pixel_values"].to(device)

    with torch.no_grad():  # inference only, no gradients needed
        image_embeddings = model.get_image_features(pixel_values=pixel_values)

    return image_embeddings[0].cpu()
In [ ]:
table.add([
    RealEstateListingLanceRecord(
        **listing.model_dump(),
        vector=get_listing_embeddings(listing).detach().numpy()
    )
    for listing in listings_with_image
])

Querying the Vector Database

In [ ]:
inputs = tokenizer("Affordable house with backyard", padding=True, truncation=True, return_tensors="pt").to(device)

with torch.no_grad():  # inference only, no gradients needed
    text_features = model.get_text_features(**inputs)[0].cpu().numpy()
In [ ]:
Markdown(
    "\n".join(
        f"* _{listing.description}_" for listing in table.search(text_features).limit(3).to_pydantic(RealEstateListingLanceRecord)
    )
)
Out[ ]:
  • This well-maintained 3-bedroom, 2-bathroom home in Meadowlands is an ideal choice for those looking for quality and affordability. The home features a large living area, a master suite with a walk-in closet, and a covered patio. A quiet, friendly neighborhood perfect for families.
  • This exquisite 5-bedroom, 4.5-bathroom residence in Silver Lake offers unparalleled luxury and comfort. With its custom design, high-end finishes, and a private dock on the lake, this home is a haven of elegance. Solar panels and smart home technology ensure efficiency and convenience.
  • This affordable 3-bedroom, 1.5-bathroom home in Greenwood Heights is a great option for first-time homebuyers. It features a functional layout, a new roof, and a spacious backyard. Close to local amenities and public transportation.
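One thing worth checking in these results: CLIP's get_text_features and get_image_features return unnormalized vectors, CLIP itself compares them by cosine similarity, and LanceDB searches by L2 distance by default (exhaustively, when no index has been built). For unit vectors, $\lVert a-b\rVert^2 = 2 - 2\cos(a,b)$, so L2 ranking and cosine ranking agree only once the vectors are normalized. Vector magnitude could be one contributing factor in the luxury Silver Lake listing ranking second for an "affordable" query, although we haven't verified that. The toy example below uses hypothetical helper names and made-up 2-D vectors to show how magnitude can reorder an exhaustive L2 search.

```python
import math


def l2_normalize(vector: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def l2_distance(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))


def top_k(query, vectors, k, score):
    """Exhaustive (flat) search: score every vector, return the indices of the k lowest scores."""
    return sorted(range(len(vectors)), key=lambda i: score(query, vectors[i]))[:k]


# Toy 2-D "embeddings" with very different magnitudes
query = [1.0, 0.0]
vectors = [[10.0, 10.0], [0.5, 0.1], [0.0, 3.0]]

raw_l2 = top_k(query, vectors, 3, l2_distance)
by_cosine = top_k(query, vectors, 3, lambda q, v: -cosine_similarity(q, v))
normalized_l2 = top_k(l2_normalize(query), [l2_normalize(v) for v in vectors], 3, l2_distance)

print(raw_l2)         # [1, 2, 0]  magnitude skews the raw L2 ranking
print(by_cosine)      # [1, 0, 2]
print(normalized_l2)  # [1, 0, 2]  same order as cosine once vectors are unit length
```

In the notebook, the equivalent change would be dividing each embedding by its L2 norm both before table.add and before table.search; alternatively, LanceDB's search can be configured to use cosine distance (see its documentation).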